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Abstract 

Increasing evidence stiows tliat complex diseases are caused by both common and rare variants. Recently, several 
statistical methods for detecting associations of rare variants have been developed, including the test for testing 
the effect of an optimally weighted combination of variants (TOW) developed by our group in 2012. These 
methodologies consider phenotype measurement at only one time point. Because many sequence data have been 
developed on population cohorts that contain phenotype measurements at multiple time points, such as the data 
set provided in the Genetic Analysis Workshop 18 (GAW18), we extend TOW from phenotype measurement at one 
time point to phenotype measurements at multiple time points. We then apply the newly proposed method to 
the GAW18 data set and compare the power of the new method with TOW using only one phenotype 
measurement. The application results show that the newly proposed method jointly modeling phenotype 
measurements at all time points has increased power over TOW. 



Background 

There is increasing interest in detecting associations 
between rare variants and complex traits. Although statis- 
tical methods to detect common variant associations have 
been well developed, these variant-by-variant methods 
may not be optimal for detecting associations of rare var- 
iants as a result of allelic heterogeneity as well as the 
extreme rarity of individual variants [1]. Recently, several 
statistical methods for detecting associations of rare var- 
iants have been developed, including the cohort allelic 
sums test [2], the combined multivariate and collapsing 
method [1], the weighted sum statistic [3], and the variable 
minor allele frequency threshold method [4], among 
others. These methods are essentially testing the effect of 
a weighted combination of variants. Thus, choosing appro- 
priate weights is critical to the performance of these meth- 
ods. In Sha et al [5], we proposed a novel test for testing 
the effect of an optimally weighted combination of variants 
(TOW). The optimal weights are analytically derived. 
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Based on the optimal weights, TOW tests the effect of a 
weighted combination of variants. Simulation studies 
showed that TOW performed better than the existing 
methods across a wide range of scenarios. Aforementioned 
methods are for phenotypes at a single time point and 
cannot be applied to longitudinal phenotypes directly. 

Meanwhile, quite a few statistical methods on the ana- 
lysis of longitudinal data in the context of genetic map- 
ping and association studies have been developed for 
common variants [6-10]. A typical method is functional 
mapping, which uses mathematical models to connect 
the actions of genes and the development of a trait. Sev- 
eral mathematical functions have been established to 
describe the development of a phenotype, including para- 
metric functions [6], semiparametric functions [8], and 
nonparametric functions [9]. From a statistical stand- 
point, any modeling using longitudinal phenotypes is 
more informative than that using phenotypes at a single 
time point and thus can increase power to test associa- 
tion [7,10]. Functional mapping capitalizes on the full 
information provided by growth and development of 
phenotypes over time, increasing the power of gene 
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identification. However, no statistical methods on the 
analysis of longitudinal data are available for rare 
variants. 

To analyze the sequencing data with phenotype mea- 
surements at multiple time points provided by Genetic 
Analysis Workshop 18 (GAW18) [11], in this article, we 
propose a novel method to test rare-variant association 
with longitudinal phenotypes by extending our pre- 
viously proposed method, TOW. Applying the proposed 
method to the GAW18 data set, we compare the power 
of the proposed method with TOW using only one phe- 
notype measurement. 

Methods 

Consider a random sample of n individuals. Each indivi- 
dual has been genotyped at M variants in a genomic 
region (a gene or a pathway). Denote [xn, ...,XiM) as the 
genotypic score of the ith individual, where 
Xim e {0, 1, 2} is the number of minor alleles. Let 

M 

Xi = ^ WmXim denote the weighted combination of gen- 

m=l 

otypic scores at the M variants, where W\, ... ,Wm are 
unknown constants and their values are determined 
later using some optimal criteria. For longitudinal data, 
we assume that phenotypes and covariates are collected 
at K time points. Let Yij and Zy = (Zyi, . . . ,ZijpY denote 
the trait values and the covariates of the ith individual 
at the /th time point. For longitudinal data, we propose 
a mixed linear model to model the relationship between 
phenotype, covariates, and genotypic scores: 



Yt = Zja + XtP + Ct, 



(3) 



■■ Zj:a + fixi + Vij + Bij, 



where y = (yii,...,)'iK,--- ,y„i,-..,)'„K)', x= (xi,...,xi,--- ,xn,...,xnf, vis 
the vector form of % and e is the vector form of We 
assume that e follows normal distribution n[o, cr^i) and v also 
follows normal distribution n(o, a^D), where d = diag{Da, . . . , Do) 
and Do depends on the level of correlation of phenotypes 
between time points. The total variance of y is s = a^o + a^i. 
Following Furlotte et al [10], we use sample correlation 
coefficients of phenotypes between time points to estimate 
Do. For variance components rr^ and a^, we use maximum 
likelihood estimates (MLEs) under null hypothesis Ho : = o 
as estimates of and <t/ and impute the estimated values of 

and into model (1). Let and denote the MLEs 
under null hypothesis of and a^, and let t = a^D + a^i. After 
imputing the estimated values of and n^, model (1) 
becomes 

y = Za +xp +e, (2) 
where e follows N(0, £)• 

Let yj = t-^l^y, XT = t^^l^x, and Zj = E"^^^Z. Then 
model (2) is equivalent to 



where Sj follows N(0, /). The score test statistic under 
model (3) to test null hypothesis Hq : yS = 0 is given by 

where y* and x* are the residuals under models 
Yt = Zja + St and xj = Zja + sj, respectively, and 

O = y*" — iXxmi ...tX\xni • • • i Xnmi Xnm) ' 

X^, is the residuals under the model X^r = Zja + sj, 

, ^ ^ , , w'^X*'^y*y*'^X*w 

and X* = {Xl,... ,X*^). Then Tscore - 

where a = X*^X*- 



One potential problem with the score test Tjcorg is that 
for genotype data of rare variants, it will be problematic 
to use A to estimate the covariance matrix because of 
sparse data. Following Pan [12] and Sha et al [5], we 
replace A by Aq = diag{A). Then, the score test statistic 

• 1 . . ^ . ^ w^X*^y*y*'^X*w 
is equivalent to Tq[w) = — 

iv'^Aqw 

As a function of w, To{w) reaches its maximum when 
u) = uf = Aq ^X*^y* and the maximum value of To[w) is 
y*^X*A^^X*^y*. Based on longitudinal data, we define 
the statistic to test the effect of the optimally weighted 

combination (L-TOW) of variants, w" x- , as 

T ,,*'rv* A-i v*r,,* u ^mJ 

Th-Txm = Y X Aq X y =2^ t • 

m=l m m 

We use a permutation test to evaluate the p-va\\ie of 
Tl-tow- In each permutation, we randomly shuffle the 
elements of y*. 

Results 

We chose 157 genetically unrelated individuals from the 
file UNREL.txt. These individuals were extracted from 
20 pedigrees in GAW18. We extracted genotypes for 
those individuals from files named chrN-dose.csv.gz. 
These files provided the estimated number of minor 
alleles carried for each variant. We used 200 replicates 
of simulated phenotype data in files PHEN.#.csv, where 
# is replicate number 1 to 200. Sex, age, medication use, 
and tobacco smoking were considered as covariates in 
this study. The phenotype data have been simulated at 
three time points with no missing data. There are 15 
individuals without phenotype values in the simulated 
phenotype data, so the actual number of individuals 
used in this study is 142. To get reasonable powers for 
the power comparison, we merged 2 replicates to form 
a new replicate, so the total number of replicates for 
power comparison in this study was 100. We know the 
answers of the simulated data set in this study. 

There are 2 related phenotypes, systolic blood pres- 
sure (SBP) and diastolic blood pressure (DBP) at three 
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time points. Based on the 2 related phenotypes, we con- 
sider 4 phenotype measurements: SBP, DBP, the first 
principal component of SBP and DBP, and the summa- 
tion of SBP and DBP. For each phenotype measurement, 
we consider five tests: (a) L-TOW, which uses pheno- 
type measurements at three time points; (b) TOW-1, 
TOW based on phenotype measurement at the first 
time point; (c) TOW-2, TOW based on phenotype mea- 
surement at the second time point; (d) TOW-3, TOW 
based on phenotype measurement at the third time 
point; and (e) TOW- Ave, TOW based on the average 
phenotype measurements over three time points. Based 
on each of the 4 phenotype measurements, we compare 
the power of L-TOW, TOW-Ave, and TOW-Single 
(average power of TOW-1, TOW-2, and TOW-3) to 
detect association between each of the top 17 genes that 
influence only DBP, only SBP, or both DBP and SBP. 
The power comparisons based on phenotype measure- 
ment DBP are given in Figure 1. This figure shows that 
in 15 of 17 genes, L-TOW is the most powerful test, 
TOW-Ave is the second most powerful test, and TOW- 
Single is the least powerful one. Power comparisons 
based on other three phenotype measurements show 
similar patterns. (Results are not showed.) 

We also evaluated type I error rates of the proposed 
test, L-TOW. To evaluate the type I error we chose 200 
blocks (100 variants in each block) from chromosome 
21 that are far from causal variants. In each block, we 
applied L-TOW to each of the 100 replicates to test 
association between genotypes and the trait SBP. We 
obtained one p-vahie for each replicate and each block. 



The histogram of the 20,000 p-vahies is given in Figure 2. 
This figure shows that the distribution of /^-values is very 
close to the uniform distribution, which indicates that 
L-TOW has correct type I error. 

Discussion 

We have developed TOW to detect association of rare 
and common variants [5]. Because the GAW18 data set 
provided phenotype measurements at multiple time 
points, similar to most of the existing methods for rare- 
variant association studies, TOW can only be applied to 
this data set by either using phenotype measurement at a 
single time point or using the average phenotype mea- 
surements over all time points. It is likely that a method 
jointly modeling phenotype measurements at all time 
points may increase power. This motivated us to extend 
our previously developed method, TOW, from phenotype 
measurement at one time point to phenotype measure- 
ments at multiple time points. By applying our newly 
developed method L-TOW to the GAW18 simulated 
data set, we showed that L-TOW has increased power 
over TOW by using either phenotype measurement at 
one time point or average phenotype measurements over 
multiple time points. 

Although we describe our method using unrelated 
individuals, it is not difficult to extend the method to 
family-based data. For family data, denote {Xiji, x^m) 
as genotypic score of the jth member in the ith family 

M 

and Xij = 'Y^WmXijm- Let Yijh and Zyt = (zyti, ■ ■ - .ZijkpY 

m=l 
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Figure 1 Power comparisons of the three tests using diastolic blood pressure as a phenotype measurement. Power of TOW-Single is the 
average power of TOW-1, TOW-2, and TOW-3. Numbers 1 to 17 on tine x-axis refer to genes ZNF443, ABTBl, FLNB SLC35E2, TNN, CGN, ZFP37, 
LRP8, RAIl, ZNF544, LEPR, MTRR NRFl, REPINl, PTFGIIP, FITS, and MAP4, respectively. TOW, statistic to test the effect of the optimally weighted 
combination. 
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P-Values 

Figure 2 Histogram of p-values. Two hundred blocks (100 variants in 
each blocl<) that are far from causal variants in chromosome 21 are 
chosen. In each block, the statistic to test the effect of the optimally 
weighted combination (L-TOW) is applied to each of the 100 replicates 
to test association between genotypes and the trait systolic blood 
pressure. One p-value is obtained for each replicate and each block. 
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denote the trait values and the covariates of the /th 
member in the /th family at the kth time point. For 
family data, we can use the following mixed linear 
model Yyk = Zj^f^a + fiXy + Uy + Vyk + e^u, 

where My is a random variable modeling the correla- 
tion between family members, ^ijk is a random variable 
modeling the correlation of phenotype measurements 
between time points, and is a random error term. 
Based on this model, using a similar argument to that in 
the Methods section, we can test association between 
the phenotype and the genomic region. 

Comparing our method with functional mapping, 
whereas our proposed method uses age as a covariate and 
uses a single parameter p as the average effect over time of 
genotypes after adjusting for age effects, functional map- 
ping uses mathematical models to connect gene actions 
and growth or development of a trait. Our proposed 
method has fewer parameters than the functional mapping 
method and uses less information. Our proposed method 
can easily incorporate the combination of rare variants. 
Incorporating the combination of rare variants to func- 
tional mapping requires further investigation. 

Conclusions 

We propose a novel method to test rare-variant associa- 
tion with longitudinal phenotypes by extending TOW, 
our previously proposed method. Application to the 
GAW18 data set shows that the newly proposed method 
jointly modeling phenotype measurements at all time 
points has increased power over TOW, which uses only 
one phenotype measurement. 
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